Information about the test:
| question | description | MATH_group |
|---|---|---|
| A1 | properties of fractions | A |
| A2 | find intersection of lines | A |
| A3 | composition of functions | B |
| A4 | completing the square | A |
| A5 | trig double angle formula | A |
| A6 | trig wave function | A |
| A7 | graphical vector sum | B |
| A8 | compute angle between 3d vectors | A |
| A9 | simplify logs | A |
| A10 | identify graph of rational functions | B |
| A11 | summing arithmetic progression | A |
| A12 | find point with given slope | A |
| A13 | equation of tangent | A |
| A14 | find minimum gradient of cubic | B |
| A15 | find and classify stationary points of cubic | A |
| A16 | trig chain rule | A |
| A17 | chain rule | A |
| A18 | definite integral | A |
| A19 | area between curve and x-axis (in 2 parts) | B |
| A20 | product rule with given values | B |
Load the student scores for the test:
test_scores %>% skim()
| Name | Piped data |
| Number of rows | 3496 |
| Number of columns | 23 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 21 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 7 | 7 | 0 | 4 | 0 |
| AnonID | 0 | 1 | 9 | 9 | 0 | 3472 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Total | 0 | 1 | 67.85 | 23.05 | 0 | 57.00 | 72.5 | 85 | 100 | ▁▁▃▇▇ |
| A1 | 0 | 1 | 4.25 | 1.51 | 0 | 5.00 | 5.0 | 5 | 5 | ▁▁▂▁▇ |
| A2 | 0 | 1 | 4.23 | 1.73 | 0 | 5.00 | 5.0 | 5 | 5 | ▁▁▁▁▇ |
| A3 | 0 | 1 | 3.20 | 2.40 | 0 | 0.00 | 5.0 | 5 | 5 | ▅▁▁▁▇ |
| A4 | 0 | 1 | 3.68 | 1.56 | 0 | 3.00 | 4.0 | 5 | 5 | ▂▂▂▃▇ |
| A5 | 0 | 1 | 3.97 | 2.02 | 0 | 5.00 | 5.0 | 5 | 5 | ▂▁▁▁▇ |
| A6 | 0 | 1 | 1.58 | 1.89 | 0 | 0.00 | 0.0 | 2 | 5 | ▇▅▁▁▃ |
| A7 | 0 | 1 | 3.45 | 2.24 | 0 | 0.00 | 5.0 | 5 | 5 | ▃▁▁▁▇ |
| A8 | 0 | 1 | 3.03 | 2.44 | 0 | 0.00 | 5.0 | 5 | 5 | ▅▁▁▁▇ |
| A9 | 0 | 1 | 4.16 | 1.87 | 0 | 5.00 | 5.0 | 5 | 5 | ▂▁▁▁▇ |
| A10 | 0 | 1 | 3.14 | 2.08 | 0 | 2.50 | 5.0 | 5 | 5 | ▃▁▃▁▇ |
| A11 | 0 | 1 | 3.89 | 1.67 | 0 | 2.50 | 5.0 | 5 | 5 | ▁▁▂▁▇ |
| A12 | 0 | 1 | 3.78 | 2.06 | 0 | 4.00 | 5.0 | 5 | 5 | ▃▁▁▁▇ |
| A13 | 0 | 1 | 3.53 | 2.05 | 0 | 2.00 | 5.0 | 5 | 5 | ▂▂▁▁▇ |
| A14 | 0 | 1 | 2.50 | 1.99 | 0 | 0.00 | 2.0 | 5 | 5 | ▇▇▁▁▇ |
| A15 | 0 | 1 | 3.35 | 2.13 | 0 | 1.25 | 5.0 | 5 | 5 | ▃▁▂▁▇ |
| A16 | 0 | 1 | 4.23 | 1.81 | 0 | 5.00 | 5.0 | 5 | 5 | ▂▁▁▁▇ |
| A17 | 0 | 1 | 4.36 | 1.67 | 0 | 5.00 | 5.0 | 5 | 5 | ▁▁▁▁▇ |
| A18 | 0 | 1 | 3.33 | 2.36 | 0 | 0.00 | 5.0 | 5 | 5 | ▅▁▁▁▇ |
| A19 | 0 | 1 | 2.33 | 2.49 | 0 | 0.00 | 0.0 | 5 | 5 | ▇▁▁▁▇ |
| A20 | 0 | 1 | 1.86 | 2.42 | 0 | 0.00 | 0.0 | 5 | 5 | ▇▁▁▁▅ |
Included in the data are many abandoned attempts, where students have apparently not engaged with most questions (e.g. stopping after the first few on the test). Unfortunately the data records only a score for each item, not whether it was actually attempted, so to remove these “non-serious” attempts we filter based on scores in the latter half of the test:
1. For students who took the test more than once, keep only the attempt with the highest total score and remove the others;
2. eliminate students who scored zero on three or more of the five easiest questions in the second half of the test; and
3. add back any students scoring 30 or more marks in total.
test_scores <- test_scores_unfiltered %>%
group_by(AnonID) %>%
slice_max(Total, n = 1) %>%
ungroup() %>%
rowwise() %>%
mutate(zeros_in_easiest_5 = sum(A11==0, A12==0, A13==0, A16==0, A17==0)) %>%
filter(zeros_in_easiest_5 <= 2 | Total >= 30) %>%
select(-zeros_in_easiest_5) %>%
ungroup()
bind_rows(
"unfiltered" = test_scores_unfiltered %>% select(Total),
"filtered" = test_scores %>% select(Total),
.id = "dataset"
) %>%
group_by(dataset) %>%
# add n's to the facet titles
mutate(dataset = str_glue("{dataset} (n={n()})") %>% as_factor()) %>%
# flip them so that filtered appears on the right
mutate(dataset = fct_rev(dataset)) %>%
ggplot(aes(x = Total)) +
geom_histogram() +
facet_wrap(vars(dataset)) +
theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Distribution of scores, filtered vs unfiltered
Summary of each cohort:
test_scores_summary <- test_scores %>%
group_by(year) %>%
summarise(
n = n(),
mean = mean(Total),
sd = sd(Total),
median = median(Total)
)
test_scores_summary %>%
gt() %>%
fmt_number(columns = c("mean", "sd"), decimals = 2) %>%
data_color(
columns = c("n"),
colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
)
| year | n | mean | sd | median |
|---|---|---|---|---|
| 2013-14 | 816 | 69.77 | 16.94 | 71.0 |
| 2014-15 | 924 | 70.30 | 16.71 | 71.5 |
| 2015-16 | 721 | 73.28 | 17.05 | 76.0 |
| 2016-17 | 760 | 77.25 | 15.31 | 79.5 |
test_scores %>%
ggplot(aes(x = Total)) +
ggridges::geom_density_ridges(aes(y = year, fill = year)) +
#facet_grid(cols = vars(year)) +
theme_minimal()
## Picking joint bandwidth of 3.9
p1 <- test_scores %>%
ggplot(aes(x = Total)) +
geom_histogram(binwidth = 5) +
#scale_x_continuous(limits = c(0,100), breaks = c(0, 50, 100)) +
facet_grid(cols = vars(year)) +
theme_minimal() +
labs(x = "Total score (out of 100)",
y = "Number of students",
title = "Edinburgh MDT") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank())
p2 <- test_scores_summary %>%
mutate(
n = str_glue("{n}"),
mean = str_glue("{round(mean, digits = 1)}"),
sd = str_glue("{round(sd, digits = 1)}"),
median = str_glue("{median}")
) %>%
pivot_longer(c(n, mean, sd, median), names_to = "layer", values_to = "label") %>%
mutate(layer = fct_relevel(layer, c("n", "sd", "mean", "median")) %>% fct_rev()) %>%
ggplot(aes(x = 80, y = layer, label = label)) +
geom_text(size = 10 * 5/14, hjust = 1) +
scale_x_continuous(limits = c(0,100)) +
facet_grid(cols = vars(year)) +
labs(y = "", x = NULL) +
scale_y_discrete(labels = c("n" = "N", "mean" = "Mean", "median" = "Median")) +
theme_minimal() +
theme(axis.line = element_blank(), axis.ticks = element_blank(), axis.text.x = element_blank(),
panel.grid = element_blank(), strip.text = element_blank())
p1 / p2 + plot_layout(heights = c(5, 2.5))
ggsave("output/uoe_pre_data-summary.pdf", units = "cm", width = 12, height = 8)
Mean and standard deviation for each item:
test_scores %>%
select(-AnonID, -Total) %>%
group_by(year) %>%
skim_without_charts() %>%
select(-contains("character."), -contains("numeric.p"), -skim_type) %>%
rename(complete = complete_rate) %>%
# make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>%
# put the columns in order by year
select(sort(names(.))) %>%
select(skim_variable, everything()) %>%
# use GT to make the table look nice
gt(rowname_col = "skim_variable") %>%
# group the columns from each year
tab_spanner_delim(delim = "__") %>%
fmt_number(columns = contains("numeric"), decimals = 2) %>%
fmt_percent(columns = contains("complete"), decimals = 0) %>%
# change all the numeric.mean and numeric.sd column names to Mean and SD
cols_label(
.list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
) %>%
cols_label(
.list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
) %>%
data_color(
columns = contains("numeric.mean"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
)
| | 2013-14 | | | | 2014-15 | | | | 2015-16 | | | | 2016-17 | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD |
| A1 | 100% | 0 | 4.32 | 1.41 | 100% | 0 | 4.40 | 1.32 | 100% | 0 | 4.45 | 1.29 | 100% | 0 | 4.52 | 1.16 |
| A2 | 100% | 0 | 4.31 | 1.65 | 100% | 0 | 4.41 | 1.53 | 100% | 0 | 4.49 | 1.45 | 100% | 0 | 4.54 | 1.38 |
| A3 | 100% | 0 | 3.33 | 2.36 | 100% | 0 | 3.25 | 2.39 | 100% | 0 | 3.39 | 2.34 | 100% | 0 | 3.61 | 2.24 |
| A4 | 100% | 0 | 3.72 | 1.43 | 100% | 0 | 3.83 | 1.33 | 100% | 0 | 3.91 | 1.38 | 100% | 0 | 4.12 | 1.22 |
| A5 | 100% | 0 | 4.05 | 1.96 | 100% | 0 | 4.19 | 1.84 | 100% | 0 | 4.28 | 1.76 | 100% | 0 | 4.32 | 1.71 |
| A6 | 100% | 0 | 1.48 | 1.85 | 100% | 0 | 1.53 | 1.85 | 100% | 0 | 1.82 | 1.96 | 100% | 0 | 2.03 | 1.97 |
| A7 | 100% | 0 | 3.58 | 2.17 | 100% | 0 | 3.47 | 2.22 | 100% | 0 | 3.70 | 2.12 | 100% | 0 | 4.14 | 1.84 |
| A8 | 100% | 0 | 2.97 | 2.46 | 100% | 0 | 3.04 | 2.44 | 100% | 0 | 3.33 | 2.36 | 100% | 0 | 3.70 | 2.19 |
| A9 | 100% | 0 | 4.31 | 1.73 | 100% | 0 | 4.46 | 1.55 | 100% | 0 | 4.45 | 1.57 | 100% | 0 | 4.65 | 1.27 |
| A10 | 100% | 0 | 3.19 | 2.03 | 100% | 0 | 3.29 | 2.00 | 100% | 0 | 3.34 | 1.97 | 100% | 0 | 3.62 | 1.89 |
| A11 | 100% | 0 | 4.05 | 1.45 | 100% | 0 | 4.16 | 1.40 | 100% | 0 | 4.13 | 1.42 | 100% | 0 | 4.25 | 1.35 |
| A12 | 100% | 0 | 3.99 | 1.90 | 100% | 0 | 3.93 | 1.95 | 100% | 0 | 4.11 | 1.81 | 100% | 0 | 4.29 | 1.65 |
| A13 | 100% | 0 | 3.69 | 1.95 | 100% | 0 | 3.68 | 1.94 | 100% | 0 | 3.83 | 1.81 | 100% | 0 | 4.06 | 1.71 |
| A14 | 100% | 0 | 2.44 | 1.93 | 100% | 0 | 2.54 | 1.93 | 100% | 0 | 2.85 | 1.92 | 100% | 0 | 3.03 | 1.90 |
| A15 | 100% | 0 | 3.57 | 2.00 | 100% | 0 | 3.51 | 2.02 | 100% | 0 | 3.67 | 1.95 | 100% | 0 | 3.68 | 2.02 |
| A16 | 100% | 0 | 4.51 | 1.49 | 100% | 0 | 4.49 | 1.52 | 100% | 0 | 4.55 | 1.43 | 100% | 0 | 4.66 | 1.26 |
| A17 | 100% | 0 | 4.69 | 1.21 | 100% | 0 | 4.59 | 1.37 | 100% | 0 | 4.74 | 1.12 | 100% | 0 | 4.75 | 1.09 |
| A18 | 100% | 0 | 3.48 | 2.30 | 100% | 0 | 3.45 | 2.31 | 100% | 0 | 3.65 | 2.22 | 100% | 0 | 3.83 | 2.12 |
| A19 | 100% | 0 | 2.28 | 2.49 | 100% | 0 | 2.26 | 2.49 | 100% | 0 | 2.61 | 2.50 | 100% | 0 | 2.99 | 2.45 |
| A20 | 100% | 0 | 1.83 | 2.41 | 100% | 0 | 1.82 | 2.41 | 100% | 0 | 2.00 | 2.45 | 100% | 0 | 2.46 | 2.50 |
Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.
If the test is unidimensional then we would expect student scores on pairs of items to be correlated.
This plot shows the correlations between scores on each pair of items:
item_scores <- test_scores %>%
select(starts_with("A"), -AnonID)
cor_ci <- psych::corCi(item_scores, plot = FALSE)
psych::cor.plot.upperLowerCi(cor_ci)
Checking for correlations that are not significantly different from 0, we find only two item pairs:
cor_ci$ci %>%
as_tibble(rownames = "corr") %>%
filter(p > 0.05) %>%
arrange(-p) %>%
select(-contains(".e")) %>%
gt() %>%
fmt_number(columns = 2:4, decimals = 3)
| corr | lower | upper | p |
|---|---|---|---|
| A3-A8 | −0.007 | 0.065 | 0.111 |
| A1-A17 | −0.005 | 0.078 | 0.086 |
The overall picture is that the item scores are well correlated with each other.
structure <- check_factorstructure(item_scores)
n <- n_factors(item_scores)
The choice of 1 dimensions is supported by 6 (31.58%) methods out of 19 (t, p, Acceleration factor, Scree (R2), VSS complexity 1, Velicer’s MAP).
plot(n)
summary(n) %>% gt()
| n_Factors | n_Methods |
|---|---|
| 1 | 6 |
| 2 | 2 |
| 3 | 2 |
| 4 | 5 |
| 12 | 1 |
| 19 | 3 |
#n %>% tibble() %>% gt()
fa.parallel(item_scores, fa = "fa")
## Parallel analysis suggests that the number of factors = 5 and the number of components = NA
We use the factanal function to fit a 1-factor model. Note that this function cannot handle missing data, so any NA scores must be set to 0 for this analysis.
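That zero-filling step is not shown in the excerpt; on a hypothetical miniature data frame (standing in for item_scores) it can be done with a simple base-R replacement:

```r
# Hypothetical mini data frame standing in for item_scores
scores <- data.frame(A1 = c(5, NA, 3), A2 = c(NA, 2, 5))

# factanal() cannot handle missing data, so zero-fill every NA
scores[is.na(scores)] <- 0

scores$A1  # now c(5, 0, 3)
```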
fitfact <- factanal(item_scores,
factors = 1,
rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
##
## Call:
## factanal(x = item_scores, factors = 1, rotation = "varimax")
##
## Uniquenesses:
## A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16
## 0.90 0.96 0.83 0.73 0.89 0.84 0.80 0.94 0.89 0.83 0.92 0.84 0.81 0.63 0.89 0.85
## A17 A18 A19 A20
## 0.90 0.85 0.74 0.68
##
## Loadings:
## [1] 0.52 0.61 0.51 0.57 0.31 0.41 0.33 0.40 0.45 0.34 0.41 0.40
## [16] 0.44 0.33 0.39 0.32 0.39
##
## Factor1
## SS loadings 3.29
## Proportion Var 0.16
##
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 1276.9 on 170 degrees of freedom.
## The p-value is 7.86e-169
load <- tidy(fitfact)
load %>%
select(question = variable, factor_loading = fl1) %>%
left_join(item_info, by = "question") %>%
ggplot(aes(x = factor_loading, y = 0, colour = MATH_group)) +
geom_point() +
geom_label_repel(aes(label = question), show.legend = FALSE) +
scale_colour_manual("MATH group", values = MATH_colours) +
scale_y_discrete() +
labs(x = "Factor 1", y = NULL,
title = "Standardised Loadings",
subtitle = "Based on 1-factor solution") +
theme_minimal()
## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
load %>%
select(question = variable, factor_loading = fl1) %>%
left_join(item_info, by = "question") %>%
arrange(-factor_loading) %>%
gt() %>%
data_color(
columns = contains("factor"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
data_color(
columns = contains("MATH"),
colors = MATH_colours
)
| question | factor_loading | description | MATH_group |
|---|---|---|---|
| A14 | 0.6113509 | find minimum gradient of cubic | B |
| A20 | 0.5654389 | product rule with given values | B |
| A4 | 0.5179007 | completing the square | A |
| A19 | 0.5095020 | area between curve and x-axis (in 2 parts) | B |
| A7 | 0.4479689 | graphical vector sum | B |
| A13 | 0.4363393 | equation of tangent | A |
| A3 | 0.4144852 | composition of functions | B |
| A10 | 0.4129131 | identify graph of rational functions | B |
| A6 | 0.4005742 | trig wave function | A |
| A12 | 0.3980326 | find point with given slope | A |
| A16 | 0.3873155 | trig chain rule | A |
| A18 | 0.3870384 | definite integral | A |
| A9 | 0.3370688 | simplify logs | A |
| A15 | 0.3346979 | find and classify stationary points of cubic | A |
| A5 | 0.3339280 | trig double angle formula | A |
| A17 | 0.3235231 | chain rule | A |
| A1 | 0.3090752 | properties of fractions | A |
| A11 | 0.2909308 | summing arithmetic progression | A |
| A8 | 0.2348027 | compute angle between 3d vectors | A |
| A2 | 0.2082143 | find intersection of lines | A |
It is striking here that the MATH Group B questions are those that load most strongly onto this factor.
Here we also investigate the 4-factor solution, to see whether these factors are interpretable.
fitfact2 <- factanal(item_scores, factors = 4, rotation = "varimax")
print(fitfact2, digits = 2, cutoff = 0.3, sort = TRUE)
##
## Call:
## factanal(x = item_scores, factors = 4, rotation = "varimax")
##
## Uniquenesses:
## A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16
## 0.88 0.95 0.70 0.69 0.86 0.75 0.77 0.81 0.87 0.80 0.90 0.81 0.69 0.58 0.87 0.59
## A17 A18 A19 A20
## 0.45 0.83 0.75 0.66
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4
## A3 0.54
## A16 0.60
## A17 0.73
## A13 0.50
## A1 0.33
## A2
## A4 0.48
## A5
## A6 0.42
## A7 0.37
## A8 0.42
## A9
## A10 0.40
## A11
## A12 0.32
## A14 0.41 0.46
## A15
## A18
## A19 0.37
## A20 0.48
##
## Factor1 Factor2 Factor3 Factor4
## SS loadings 1.82 1.14 1.01 0.80
## Proportion Var 0.09 0.06 0.05 0.04
## Cumulative Var 0.09 0.15 0.20 0.24
##
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 196.75 on 116 degrees of freedom.
## The p-value is 4.23e-06
load2 <- tidy(fitfact2)
load2_plot <- load2 %>%
rename(question = variable) %>%
left_join(item_info, by = "question") %>%
ggplot(aes(x = fl1, y = fl2, colour = MATH_group, shape = MATH_group)) +
geom_point() +
geom_text_repel(aes(label = question), show.legend = FALSE, alpha = 0.6) +
labs(
x = "Factor 1 (of 4)",
y = "Factor 2 (of 4)"
) +
scale_colour_manual("MATH group", values = MATH_colours[1:2]) +
scale_shape_manual(name = "MATH group", values = c(19, 17)) +
theme_minimal()
load2_plot +
labs(
title = "Standardised Loadings",
subtitle = "Showing the first 2 factors of the 4-factor model"
)
ggsave("output/uoe_pre_4factor.pdf", units = "cm", width = 14, height = 10, dpi = 300,
plot = load2_plot)
main_factors <- load2 %>%
# mutate(factorNone = 0.4) %>% # add this to set the main factor to "None" where all loadings are below 0.4
pivot_longer(names_to = "factor",
cols = contains("fl")) %>%
mutate(value_abs = abs(value)) %>%
group_by(variable) %>%
top_n(1, value_abs) %>%
ungroup() %>%
transmute(main_factor = factor, variable)
load2 %>%
select(-uniqueness) %>%
# add the info about which is the main factor
left_join(main_factors, by = "variable") %>%
left_join(item_info %>% select(variable = question, description, MATH_group), by = "variable") %>%
arrange(main_factor) %>%
select(main_factor, everything()) %>%
# arrange adjectives by descending loading on main factor
rowwise() %>%
mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>%
group_by(main_factor) %>%
arrange(-max_loading, .by_group = TRUE) %>%
select(-max_loading) %>%
# sort out the presentation
rename("Main Factor" = main_factor,
"Question" = variable) %>%
mutate_at(
vars(starts_with("fl")),
~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
) %>%
kable(booktabs = T, escape = F, longtable = T) %>%
kableExtra::collapse_rows(columns = 1, valign = "top") %>%
kableExtra::kable_styling(latex_options = c("repeat_header"))
| Main Factor | Question | fl1 | fl2 | fl3 | fl4 | description | MATH_group |
|---|---|---|---|---|---|---|---|
| fl1 | A3 | 0.539 | 0.035 | 0.058 | 0.031 | composition of functions | B |
| fl1 | A20 | 0.483 | 0.144 | 0.258 | 0.13 | product rule with given values | B |
| fl1 | A4 | 0.483 | 0.049 | 0.185 | 0.197 | completing the square | A |
| fl1 | A10 | 0.399 | 0.086 | 0.159 | 0.066 | identify graph of rational functions | B |
| fl1 | A7 | 0.373 | 0.013 | 0.248 | 0.16 | graphical vector sum | B |
| fl1 | A19 | 0.37 | 0.177 | 0.244 | 0.163 | area between curve and x-axis (in 2 parts) | B |
| fl1 | A1 | 0.331 | 0.025 | 0.084 | 0.079 | properties of fractions | A |
| fl1 | A9 | 0.247 | 0.151 | 0.072 | 0.202 | simplify logs | A |
| fl1 | A2 | 0.166 | 0.03 | 0.074 | 0.114 | find intersection of lines | A |
| fl2 | A17 | 0.028 | 0.727 | 0.126 | 0.056 | chain rule | A |
| fl2 | A16 | 0.108 | 0.598 | 0.135 | 0.16 | trig chain rule | A |
| fl2 | A18 | 0.191 | 0.244 | 0.162 | 0.224 | definite integral | A |
| fl3 | A13 | 0.171 | 0.127 | 0.504 | 0.083 | equation of tangent | A |
| fl3 | A14 | 0.406 | 0.122 | 0.46 | 0.162 | find minimum gradient of cubic | B |
| fl3 | A12 | 0.173 | 0.103 | 0.323 | 0.207 | find point with given slope | A |
| fl3 | A15 | 0.127 | 0.179 | 0.261 | 0.135 | find and classify stationary points of cubic | A |
| fl4 | A8 | 0.022 | 0.062 | 0.083 | 0.419 | compute angle between 3d vectors | A |
| fl4 | A6 | 0.195 | 0.074 | 0.171 | 0.415 | trig wave function | A |
| fl4 | A5 | 0.24 | 0.086 | 0.08 | 0.254 | trig double angle formula | A |
| fl4 | A11 | 0.21 | 0.113 | 0.045 | 0.212 | summing arithmetic progression | A |
The first factor is dominated by questions that had previously been identified as MATH Group B, i.e. those that are somehow “non-standard”: they require students either to recognise that a particular rule/procedure is applicable before applying it, or to apply it in an unusual way. This factor also includes Group A questions on “pre-calculus” topics (such as fractions, logarithms and trigonometry) that students had perhaps not practised as recently.
The second factor is dominated by the two chain rule questions (A16 and A17), along with A18 which is a routine definite integral, suggesting this factor is related to routine calculus computations.
The third factor seems to be based on applying calculus techniques to cubic and quadratic curves, e.g. to find tangent lines or stationary points.
The fourth factor is dominated by the only two questions that require the use of a calculator (to compute trigonometric functions), but more generally seems to be based on non-calculus skills (vectors, trig, sequences).
The mirt implementation of the generalised partial credit model (gpcm) requires that the partial marks are consecutive integers. We therefore work around this by recoding our scores into that form (e.g. replacing scores of 0, 2.5, 5 with 1, 2, 3), while keeping track of the true scores attached to each mark level so that we can properly compute expected scores later on.
# Determine the mark levels for each item
mark_levels <- item_scores %>%
pivot_longer(everything(), names_to = "item", values_to = "score") %>%
distinct() %>%
arrange(parse_number(item), score) %>%
group_by(item) %>%
mutate(order = row_number()) %>%
# Note that the convention used by mirt is for items that have only 2 levels (i.e. 0 marks or full marks),
# the columns are P.0 and P.1, while other items are indexed from 1, i.e. P.1, P.2, ...
# https://github.com/philchalmers/mirt/blob/accd2383b9a4d17a4cab269717ce98434900b62c/R/probtrace.R#L57
mutate(level = case_when(
max(order) == 2 ~ order - 1,
TRUE ~ order * 1.0
)) %>%
mutate(levelname = paste0(item, ".P.", level))
# Use the mark_levels table to replace scores with levels
# (first pivot the data to long form, make the replacement, then pivot back to wide again)
item_scores_levelled <- item_scores %>%
# temporarily add row identifiers
mutate(row = row_number()) %>%
pivot_longer(cols = -row, names_to = "item", values_to = "score") %>%
left_join(mark_levels %>% select(item, score, level), by = c("item", "score")) %>%
select(-score) %>%
pivot_wider(names_from = "item", values_from = "level") %>%
select(-row)
fit_gpcm <- mirt(
data = item_scores_levelled, # just the columns with question scores
model = 1, # number of factors to extract
itemtype = "gpcm", # generalised partial credit model
SE = TRUE # estimate standard errors
)
##
Iteration: 1, Log-Lik: -50895.842, Max-Change: 6.51151
Iteration: 2, Log-Lik: -47143.856, Max-Change: 1.39679
Iteration: 3, Log-Lik: -46111.495, Max-Change: 3.70482
Iteration: 4, Log-Lik: -45840.485, Max-Change: 0.91195
Iteration: 5, Log-Lik: -45689.338, Max-Change: 1.04099
Iteration: 6, Log-Lik: -45627.116, Max-Change: 0.33955
Iteration: 7, Log-Lik: -45599.220, Max-Change: 0.16658
Iteration: 8, Log-Lik: -45588.708, Max-Change: 0.05793
Iteration: 9, Log-Lik: -45583.407, Max-Change: 0.06734
Iteration: 10, Log-Lik: -45580.515, Max-Change: 0.04218
Iteration: 11, Log-Lik: -45578.761, Max-Change: 0.02592
Iteration: 12, Log-Lik: -45577.647, Max-Change: 0.02897
Iteration: 13, Log-Lik: -45576.671, Max-Change: 0.01668
Iteration: 14, Log-Lik: -45576.152, Max-Change: 0.00766
Iteration: 15, Log-Lik: -45575.863, Max-Change: 0.00568
Iteration: 16, Log-Lik: -45575.430, Max-Change: 0.00230
Iteration: 17, Log-Lik: -45575.403, Max-Change: 0.00567
Iteration: 18, Log-Lik: -45575.379, Max-Change: 0.00174
Iteration: 19, Log-Lik: -45575.374, Max-Change: 0.00164
Iteration: 20, Log-Lik: -45575.363, Max-Change: 0.00188
Iteration: 21, Log-Lik: -45575.354, Max-Change: 0.00058
Iteration: 22, Log-Lik: -45575.353, Max-Change: 0.00049
Iteration: 23, Log-Lik: -45575.352, Max-Change: 0.00118
Iteration: 24, Log-Lik: -45575.346, Max-Change: 0.00065
Iteration: 25, Log-Lik: -45575.344, Max-Change: 0.00017
Iteration: 26, Log-Lik: -45575.344, Max-Change: 0.00016
Iteration: 27, Log-Lik: -45575.343, Max-Change: 0.00009
##
## Calculating information matrix...
We compute Yen’s (1984) \(Q_3\) statistic to check for any dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with absolute values above 0.2 regarded as problematic (see DeMars, p. 48).
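For reference, Yen’s statistic for a pair of items \(i\) and \(j\) is the correlation between their residuals once the modelled ability has been accounted for (standard definition, stated here for context):

\[
Q_{3,ij} = \operatorname{cor}(d_i, d_j), \qquad d_i = x_i - E(x_i \mid \hat{\theta}),
\]

where \(x_i\) is the observed score on item \(i\) and \(E(x_i \mid \hat{\theta})\) is the model-expected score at the estimated ability.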
# residuals holds Yen's Q3 statistic for each pair of items
# (e.g. as returned by mirt's residuals(fit_gpcm, type = "Q3"))
residuals %>% as.matrix() %>%
corrplot::corrplot(type = "upper")
This shows that most item pairs are independent, with only one pair showing cause for concern:
residuals %>%
rownames_to_column(var = "item1") %>%
as_tibble() %>%
pivot_longer(cols = starts_with("A"), names_to = "item2", values_to = "Q3_score") %>%
filter(abs(Q3_score) > 0.2) %>%
filter(parse_number(item1) < parse_number(item2)) %>%
gt()
| item1 | item2 | Q3_score |
|---|---|---|
| A16 | A17 | 0.323394 |
Items A16 and A17 are on the chain rule (e.g. differentiating \(\cos(4x^2+5)\) and \((3x^2-8)^3\) respectively), so it is perhaps unsurprising that students’ performance on these items is not entirely independent.
Given that this violation of the local independence assumption is very mild, we proceed using this model.
We augment the data with estimated abilities for each student, using mirt’s fscores() function.
test_scores_with_ability <- test_scores %>%
mutate(F1 = fscores(fit_gpcm, method = "EAP")[, "F1"])
Next, we extract the IRT parameters.
coefs_gpcm <- coef(fit_gpcm, IRTpars = TRUE)
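For reference, with IRTpars = TRUE the parameters follow the usual GPCM parameterisation, in which the probability of scoring at level \(k\) of an item with \(K + 1\) levels is (standard formula, stated here for context):

\[
P(X = k \mid \theta) = \frac{\exp \sum_{j=1}^{k} a(\theta - b_j)}{\sum_{m=0}^{K} \exp \sum_{j=1}^{m} a(\theta - b_j)},
\]

where \(a\) is the item’s discrimination, the \(b_j\) are its step difficulties, and the empty sum for \(k = 0\) is taken to be 0; these correspond to the a and b values tabulated below.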
We use a modified version of the tidy_mirt_coefs function to get all the parameter estimates into a tidy table:
tidy_mirt_coefs <- function(x){
x %>%
# melt the list element
melt() %>%
# convert to a tibble
as_tibble() %>%
# convert factors to characters
mutate(across(where(is.factor), as.character)) %>%
# only focus on rows where X2 is a, or starts with b (the parameters in the GPCM)
filter(X2 == "a" | str_detect(X2, "^b")) %>%
# in X1, relabel par (parameter) as est (estimate)
mutate(X1 = if_else(X1 == "par", "est", X1)) %>%
# turn into a wider data frame
pivot_wider(names_from = X1, values_from = value) %>%
rename(par = X2)
}
# use head(., -1) to remove the last element, `GroupPars`, which does not correspond to a question
tidy_gpcm <- map_dfr(head(coefs_gpcm, -1), tidy_mirt_coefs, .id = "Question")
## Warning in type.convert.default(X[[i]], ...): 'as.is' should be specified by the
## caller; using TRUE
tidy_gpcm %>%
filter(par == "a") %>%
select(-par) %>%
rename_with(.fn = ~ paste0("a_", .x), .cols = -Question) %>%
left_join(
tidy_gpcm %>%
filter(str_detect(par, "^b")),
by = "Question"
) %>%
gt(groupname_col = "Question") %>%
fmt_number(columns = matches("est|CI"), decimals = 3) %>%
data_color(
columns = contains("a_"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
data_color(
columns = c("est", "CI_2.5", "CI_97.5"),
colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
) %>%
tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
tab_spanner(label = "Difficulty", columns = c("par", "est", "CI_2.5", "CI_97.5"))
| Discrimination | | | Difficulty | | | |
|---|---|---|---|---|---|---|
| a_est | a_CI_2.5 | a_CI_97.5 | par | est | CI_2.5 | CI_97.5 |
| A1 | ||||||
| 0.6739273 | 0.5766091 | 0.7712455 | b1 | -2.37887310 | -2.70677829 | -2.05096790 |
| 0.6739273 | 0.5766091 | 0.7712455 | b2 | -2.77145860 | -3.15954563 | -2.38337157 |
| A2 | ||||||
| 0.3663307 | 0.2934319 | 0.4392295 | b1 | 2.05108376 | 1.24385401 | 2.85831351 |
| 0.3663307 | 0.2934319 | 0.4392295 | b2 | -8.62067669 | -10.37049167 | -6.87086170 |
| A3 | ||||||
| 1.0870453 | 0.9675230 | 1.2065675 | b1 | -0.84539828 | -0.94886622 | -0.74193034 |
| A4 | ||||||
| 0.2734782 | 0.2460363 | 0.3009201 | b1 | 11.73622013 | 7.30832377 | 16.16411649 |
| 0.2734782 | 0.2460363 | 0.3009201 | b2 | -14.41982341 | -18.82895390 | -10.01069291 |
| 0.2734782 | 0.2460363 | 0.3009201 | b3 | 3.21551287 | 1.76923276 | 4.66179298 |
| 0.2734782 | 0.2460363 | 0.3009201 | b4 | -7.39155856 | -8.82579336 | -5.95732375 |
| 0.2734782 | 0.2460363 | 0.3009201 | b5 | 2.20536928 | 1.28228743 | 3.12845113 |
| 0.2734782 | 0.2460363 | 0.3009201 | b6 | -4.92286032 | -5.84277186 | -4.00294877 |
| 0.2734782 | 0.2460363 | 0.3009201 | b7 | 1.56215355 | 0.89309330 | 2.23121379 |
| 0.2734782 | 0.2460363 | 0.3009201 | b8 | -3.60160285 | -4.28569144 | -2.91751425 |
| 0.2734782 | 0.2460363 | 0.3009201 | b9 | 6.27564656 | 5.27623198 | 7.27506114 |
| 0.2734782 | 0.2460363 | 0.3009201 | b10 | -9.54842246 | -10.80070414 | -8.29614078 |
| A5 | ||||||
| 1.0451735 | 0.9039862 | 1.1863607 | b1 | -1.90850902 | -2.11948357 | -1.69753446 |
| A6 | ||||||
| 0.4263958 | 0.3782147 | 0.4745769 | b1 | 0.75043015 | 0.52648747 | 0.97437283 |
| 0.4263958 | 0.3782147 | 0.4745769 | b2 | 9.20247071 | 7.79017178 | 10.61476964 |
| 0.4263958 | 0.3782147 | 0.4745769 | b3 | -7.47434964 | -8.85132818 | -6.09737111 |
| A7 | ||||||
| 0.6618140 | 0.5902292 | 0.7333989 | b1 | 1.71629156 | 1.34094390 | 2.09163921 |
| 0.6618140 | 0.5902292 | 0.7333989 | b2 | -3.93569909 | -4.42025082 | -3.45114736 |
| A8 | ||||||
| 0.5232691 | 0.4326544 | 0.6138839 | b1 | -1.24674100 | -1.48771035 | -1.00577165 |
| A9 | ||||||
| 1.2731825 | 1.0950872 | 1.4512779 | b1 | -2.08927226 | -2.30486362 | -1.87368089 |
| A10 | ||||||
| 0.6375293 | 0.5675827 | 0.7074758 | b1 | -0.83165992 | -1.00159912 | -0.66172072 |
| 0.6375293 | 0.5675827 | 0.7074758 | b2 | -1.09423875 | -1.28487792 | -0.90359958 |
| A11 | ||||||
| 0.2874306 | 0.2455924 | 0.3292689 | b1 | -1.83148685 | -2.73156569 | -0.93140801 |
| 0.2874306 | 0.2455924 | 0.3292689 | b2 | -5.93170797 | -6.91126447 | -4.95215147 |
| 0.2874306 | 0.2455924 | 0.3292689 | b3 | 9.59073848 | 7.75997286 | 11.42150409 |
| 0.2874306 | 0.2455924 | 0.3292689 | b4 | -14.01994221 | -16.34217642 | -11.69770801 |
| A12 | ||||||
| 0.4469893 | 0.3938993 | 0.5000792 | b1 | 0.92339830 | 0.47109722 | 1.37569938 |
| 0.4469893 | 0.3938993 | 0.5000792 | b2 | 0.82752184 | 0.28810586 | 1.36693783 |
| 0.4469893 | 0.3938993 | 0.5000792 | b3 | -6.81714207 | -7.72532434 | -5.90895981 |
| A13 | ||||||
| 0.4498875 | 0.4002035 | 0.4995716 | b1 | -0.92305819 | -1.21251257 | -0.63360381 |
| 0.4498875 | 0.4002035 | 0.4995716 | b2 | 4.44772263 | 3.63907005 | 5.25637520 |
| 0.4498875 | 0.4002035 | 0.4995716 | b3 | -7.94643800 | -9.00376537 | -6.88911062 |
| A14 | ||||||
| 0.5058676 | 0.4592957 | 0.5524394 | b1 | 1.59690368 | 1.20277845 | 1.99102890 |
| 0.5058676 | 0.4592957 | 0.5524394 | b2 | -3.78725933 | -4.21495258 | -3.35956608 |
| 0.5058676 | 0.4592957 | 0.5524394 | b3 | 8.73895686 | 7.38014808 | 10.09776565 |
| 0.5058676 | 0.4592957 | 0.5524394 | b4 | -3.77332206 | -4.98361643 | -2.56302770 |
| 0.5058676 | 0.4592957 | 0.5524394 | b5 | -4.33839831 | -4.96075350 | -3.71604312 |
| A15 | ||||||
| 0.2315387 | 0.2010493 | 0.2620280 | b1 | 8.34058532 | 6.77169763 | 9.90947301 |
| 0.2315387 | 0.2010493 | 0.2620280 | b2 | -6.58343508 | -7.89985746 | -5.26701269 |
| 0.2315387 | 0.2010493 | 0.2620280 | b3 | 3.11283156 | 2.20383750 | 4.02182563 |
| 0.2315387 | 0.2010493 | 0.2620280 | b4 | -10.63256038 | -12.21589986 | -9.04922090 |
| A16 | ||||||
| 1.7636035 | 1.5204396 | 2.0067673 | b1 | -1.87625943 | -2.03473886 | -1.71778000 |
| A17 | ||||||
| 1.6549288 | 1.3943388 | 1.9155188 | b1 | -2.22363608 | -2.44350649 | -2.00376566 |
| A18 | ||||||
| 0.9999717 | 0.8822981 | 1.1176452 | b1 | -1.12920867 | -1.25973364 | -0.99868370 |
| A19 | ||||||
| 1.3764839 | 1.2429462 | 1.5100216 | b1 | -0.02088671 | -0.08918441 | 0.04741098 |
| A20 | ||||||
| 1.8142982 | 1.6389921 | 1.9896043 | b1 | 0.33216145 | 0.26971198 | 0.39461092 |
tidy_gpcm %>%
write_csv("output/uoe_pre_gpcm-results.csv")
theta <- seq(-6, 6, by=0.05)
info_matrix <- testinfo(fit_gpcm, theta, individual = TRUE)
colnames(info_matrix) <- item_info %>% pull(question)
item_info_data <- info_matrix %>%
as_tibble() %>%
bind_cols(theta = theta) %>%
pivot_longer(cols = -theta, names_to = "item", values_to = "info_y") %>%
left_join(item_info %>% select(item = question, MATH_group), by = "item") %>%
mutate(item = fct_reorder(item, parse_number(item)))
item_info_data %>%
group_by(theta) %>%
summarise(info_y = sum(info_y)) %>%
ggplot(aes(x = theta, y = info_y)) +
geom_line() +
labs(x = "Ability", y = "Information", title = "Edinburgh MDT") +
theme_minimal()
ggsave("output/uoe_pre_info.pdf", width = 10, height = 6, units = "cm")
This shows that the information given by the test is concentrated toward the lower end of the ability scale, i.e. it estimates students' ability most precisely when their ability level is slightly below the mean.
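Information translates directly into measurement precision: in IRT, the standard error of an ability estimate is the reciprocal square root of the test information, \(SE(\theta) = 1/\sqrt{I(\theta)}\). As a quick sketch (reusing the item_info_data tibble built above), the corresponding standard-error curve is:

```r
# Standard error of the ability estimate: SE(theta) = 1 / sqrt(I(theta)).
# Where the test carries more information, ability is estimated more precisely.
item_info_data %>%
  group_by(theta) %>%
  summarise(info_y = sum(info_y)) %>%
  mutate(se = 1 / sqrt(info_y)) %>%
  ggplot(aes(x = theta, y = se)) +
  geom_line() +
  labs(x = "Ability", y = "Standard error") +
  theme_minimal()
```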
Breaking this down by question helps to highlight those questions that are most/least informative:
item_info_data %>%
ggplot(aes(x = theta, y = info_y, colour = item)) +
geom_line() +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
facet_wrap(vars(item)) +
labs(y = "Information") +
theme_minimal()
We can also compute the sums of different subsets of the information curves – here, we will look at the questions based on their MATH group:
item_info_data %>%
group_by(theta) %>%
summarise(
items_all = sum(info_y),
items_A = sum(ifelse(MATH_group == "A", info_y, 0)),
items_B = sum(ifelse(MATH_group == "B", info_y, 0))
) %>%
pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>%
mutate(items = fct_relevel(items, "all", "A", "B")) %>%
ggplot(aes(x = theta, y = info_y, colour = items)) +
geom_line() +
scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
labs(x = "Ability", y = "Information") +
theme_minimal()
ggsave("output/uoe_pre_info-curves_A-vs-B.pdf", units = "cm", width = 14, height = 6)
This shows that the information in the MATH Group B questions peaks at a higher point on the ability scale than that in the MATH Group A questions.
Since the number of items in each case is different, we consider instead the mean information per item:
item_info_data %>%
group_by(theta) %>%
summarise(
items_all = sum(info_y) / n(),
items_A = sum(ifelse(MATH_group == "A", info_y, 0)) / sum(ifelse(MATH_group == "A", 1, 0)),
items_B = sum(ifelse(MATH_group == "B", info_y, 0)) / sum(ifelse(MATH_group == "B", 1, 0))
) %>%
pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>%
mutate(items = fct_relevel(items, "all", "A", "B")) %>%
ggplot(aes(x = theta, y = info_y, colour = items)) +
geom_line() +
scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
labs(x = "Ability", y = "Mean information per item") +
theme_minimal()
ggsave("output/uoe_pre_info-curves_A-vs-B-avg.pdf", units = "cm", width = 10, height = 6)
This shows that items of each MATH group are giving broadly similar levels of information on average, but at different points on the ability scale.
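An equivalent, more compact way to compute the per-group means is to group by MATH_group directly, rather than spelling out each subset with ifelse. A sketch using the same item_info_data (it produces the A and B curves but not the combined "all" curve, which needs a separate summarise):

```r
# Mean information per item at each ability level, split by MATH group
item_info_data %>%
  group_by(theta, MATH_group) %>%
  summarise(info_y = mean(info_y), .groups = "drop") %>%
  ggplot(aes(x = theta, y = info_y, colour = MATH_group)) +
  geom_line() +
  scale_colour_manual(values = MATH_colours) +
  labs(x = "Ability", y = "Mean information per item") +
  theme_minimal()
```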
Using mirt’s areainfo function, we can find
the total area under the information curves.
info_gpcm <- areainfo(fit_gpcm, c(-4,4))
info_gpcm %>% gt()
| LowerBound | UpperBound | Info | TotalInfo | Proportion | nitems |
|---|---|---|---|---|---|
| -4 | 4 | 24.91412 | 27.42629 | 0.9084027 | 20 |
This shows that the total information across all items is 27.43, with 90.8% of it lying in the range \(-4\leq\theta\leq4\).
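The Proportion column is simply the ratio of the two areas, i.e. the share of the total information that falls within the chosen bounds:

```r
# Proportion = Info / TotalInfo
24.91412 / 27.42629  # approx. 0.9084, matching the table above
```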
tidy_info <- item_info %>%
mutate(item_num = row_number()) %>%
mutate(TotalInfo = purrr::map_dbl(
item_num,
~ areainfo(fit_gpcm,
c(-4, 4),
which.items = .x) %>% pull(TotalInfo)
))
tidy_info %>%
select(-item_num) %>%
arrange(-TotalInfo) %>%
gt() %>%
data_color(
columns = contains("Info"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
cols_label(
TotalInfo = "Information"
)
| question | description | MATH_group | Information |
|---|---|---|---|
| A4 | completing the square | A | 2.7267354 |
| A14 | find minimum gradient of cubic | B | 2.5274266 |
| A20 | product rule with given values | B | 1.8142981 |
| A16 | trig chain rule | A | 1.7636024 |
| A17 | chain rule | A | 1.6549245 |
| A19 | area between curve and x-axis (in 2 parts) | B | 1.3764810 |
| A1 | properties of fractions | A | 1.3437311 |
| A13 | equation of tangent | A | 1.3420102 |
| A12 | find point with given slope | A | 1.3372831 |
| A7 | graphical vector sum | B | 1.3232681 |
| A6 | trig wave function | A | 1.2742625 |
| A9 | simplify logs | A | 1.2731285 |
| A10 | identify graph of rational functions | B | 1.2726651 |
| A11 | summing arithmetic progression | A | 1.1089275 |
| A3 | composition of functions | B | 1.0869852 |
| A5 | trig double angle formula | A | 1.0449474 |
| A18 | definite integral | A | 0.9998165 |
| A15 | find and classify stationary points of cubic | A | 0.9166682 |
| A2 | find intersection of lines | A | 0.7226252 |
| A8 | compute angle between 3d vectors | A | 0.5165082 |
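The which.items argument of areainfo also accepts several items at once, so the information carried by a whole MATH group can be totalled in a single call. A sketch, assuming the items appear in fit_gpcm in the same order as the rows of item_info:

```r
# Total information in the MATH Group B items over the same range
b_items <- which(item_info$MATH_group == "B")
areainfo(fit_gpcm, c(-4, 4), which.items = b_items)
```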
Restricting instead to the range \(-2\leq\theta\leq2\):
tidy_info <- item_info %>%
mutate(item_num = row_number()) %>%
mutate(TotalInfo = purrr::map_dbl(
item_num,
~ areainfo(fit_gpcm,
c(-2, 2),
which.items = .x) %>% pull(Info)
))
tidy_info %>%
select(-item_num) %>%
arrange(-TotalInfo) %>%
gt() %>%
data_color(
columns = contains("Info"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
cols_label(
TotalInfo = "Information"
)
| question | description | MATH_group | Information |
|---|---|---|---|
| A14 | find minimum gradient of cubic | B | 2.0688682 |
| A20 | product rule with given values | B | 1.7043642 |
| A4 | completing the square | A | 1.5795642 |
| A19 | area between curve and x-axis (in 2 parts) | B | 1.2114717 |
| A16 | trig chain rule | A | 0.9757468 |
| A7 | graphical vector sum | B | 0.9572318 |
| A13 | equation of tangent | A | 0.8503176 |
| A10 | identify graph of rational functions | B | 0.8046603 |
| A3 | composition of functions | B | 0.7987483 |
| A12 | find point with given slope | A | 0.7787257 |
| A6 | trig wave function | A | 0.7728878 |
| A17 | chain rule | A | 0.6745425 |
| A18 | definite integral | A | 0.6629645 |
| A9 | simplify logs | A | 0.5935119 |
| A5 | trig double angle formula | A | 0.5302626 |
| A15 | find and classify stationary points of cubic | A | 0.4802470 |
| A1 | properties of fractions | A | 0.4683832 |
| A11 | summing arithmetic progression | A | 0.3844167 |
| A8 | compute angle between 3d vectors | A | 0.2316390 |
| A2 | find intersection of lines | A | 0.1933876 |
Since the GPCM is more complicated than the 2PL, there is a characteristic curve for each possible score on each question:
trace_data <- probtrace(fit_gpcm, theta) %>%
as_tibble() %>%
bind_cols(theta = theta) %>%
pivot_longer(cols = -theta, names_to = "level", values_to = "y") %>%
left_join(mark_levels %>% select(item, level = levelname, score), by = "level") %>%
mutate(score = as.factor(score))
trace_data %>%
mutate(item = fct_reorder(item, parse_number(item))) %>%
ggplot(aes(x = theta, y = y, colour = score)) +
geom_line() +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
facet_wrap(vars(item)) +
labs(y = "Probability of response") +
theme_minimal()
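For reference, in the standard (Muraki) difficulty parameterisation of the GPCM, which corresponds to the \(a\) and \(b\) estimates tabulated above, the probability of a response in score category \(k\) of an item with categories \(0, 1, \dots, m\) is

\[
P(X = k \mid \theta)
= \frac{\exp\left(\sum_{v=1}^{k} a\,(\theta - b_v)\right)}
       {\sum_{c=0}^{m} \exp\left(\sum_{v=1}^{c} a\,(\theta - b_v)\right)},
\]

with the empty sum (for \(k = 0\)) taken to be zero; the traces plotted above are these curves evaluated over the grid of \(\theta\) values.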
To get a simplified picture for each question, we compute the expected score \(E[X \mid \theta] = \sum_k k\,P_k(\theta)\) at each ability level:
expected_scores <- trace_data %>%
mutate(item = fct_reorder(item, parse_number(item))) %>%
group_by(item, theta) %>%
summarise(expected_score = sum(as.double(as.character(score)) * y), .groups = "drop")
expected_scores %>%
ggplot(aes(x = theta, y = expected_score, colour = item)) +
geom_line() +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
facet_wrap(vars(item)) +
labs(y = "Expected score") +
theme_minimal()
The resulting curves look quite similar to those from the 2PL, allowing for a similar interpretation. For instance, superimposing all the curves shows that there is a spread of difficulties (i.e. of the values of \(\theta\) at which the expected score is 2.5 out of 5) and that some questions are more discriminating than others (i.e. have steeper slopes):
plt <- expected_scores %>%
ggplot(aes(x = theta, y = expected_score, colour = item, text = item)) +
geom_line() +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
labs(y = "Expected score") +
theme_minimal()
ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/uoe_pre_iccs-superimposed.pdf", width = 20, height = 14, units = "cm")
highlight_removed_items <- expected_scores %>%
mutate(highlight_item = item %in% c("A2", "A8", "A11"))
highlight_removed_items %>%
ggplot(aes(x = theta, y = expected_score, colour = item, text = item, alpha = highlight_item)) +
geom_line(aes(size = highlight_item)) +
#geom_point(data = highlight_removed_items %>% filter(highlight_item == TRUE, theta == 0)) +
ggrepel::geom_label_repel(
data = highlight_removed_items %>% filter(highlight_item == TRUE, theta == 0),
aes(label = item),
box.padding = 0,
show.legend = FALSE
) +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
scale_size_manual(values = c("FALSE" = 0.6, "TRUE" = 0.9), guide = "none") +
scale_alpha_manual(values = c("FALSE" = 0.2, "TRUE" = 1), guide = "none") +
labs(x = "Ability", y = "Expected score") +
theme_minimal() +
theme(legend.position="bottom",#legend.title=element_blank(),
legend.margin = margin(0, 0, 0, 0),
legend.spacing.x = unit(1, "mm"),
legend.spacing.y = unit(0, "mm")) +
guides(colour = guide_legend(nrow = 2))
ggsave(file = "output/uoe_pre_iccs-highlight.pdf", width = 16, height = 10, units = "cm")
total_expected_score <- expected_scores %>%
group_by(theta) %>%
summarise(expected_score = sum(expected_score))
total_expected_score %>%
ggplot(aes(x = theta, y = expected_score)) +
geom_line() +
# geom_point(data = total_expected_score %>% filter(theta == 0)) +
# ggrepel::geom_label_repel(data = total_expected_score %>% filter(theta == 0), aes(label = round(expected_score, 1)), box.padding = 0.5) +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
labs(y = "Expected score") +
theme_minimal()
course_results <- read_csv("data-uoe/ANON_2013-2017_course-results.csv", col_types = "ccddddddddd")
course_results_long <- course_results %>%
pivot_longer(cols = !c(AnonID, year), names_to = "course", values_to = "mark") %>%
filter(!is.na(mark)) %>%
separate(course, into = c("course_type", "course"), sep = "_") %>%
mutate(year = str_replace(year, "/", "-1"))
course_results_vs_diagtest <- course_results_long %>%
left_join(test_scores_with_ability %>% select(AnonID, year, diagtest_score = Total, F1), by = c("AnonID", "year")) %>%
filter(!is.na(diagtest_score))
We have both course results and diagnostic test scores for the following number of students:
course_results_vs_diagtest %>%
select(AnonID, year) %>%
distinct() %>%
tally()
## # A tibble: 1 × 1
## n
## <int>
## 1 2509
Mathematics students take linear algebra (ILA) in semester 1, then calculus (CAP) and a proofs course (PPS) in semester 2.
course_results_vs_diagtest %>%
filter(course_type == "spec") %>%
janitor::tabyl(year, course) %>%
gt()
| year | CAP | ILA | PPS |
|---|---|---|---|
| 2013-14 | 286 | 302 | 170 |
| 2014-15 | 313 | 347 | 177 |
| 2015-16 | 236 | 257 | 168 |
| 2016-17 | 280 | 300 | 179 |
course_results_vs_diagtest %>%
filter(course_type == "spec") %>%
mutate(course = fct_relevel(course, "ILA", "CAP", "PPS")) %>%
ggplot(aes(x = diagtest_score, y = mark)) +
geom_point(size = 1, stroke = 0) +
geom_smooth(method = lm, formula = y ~ x) +
ggpubr::stat_cor(label.y = 105, p.accuracy = 0.001) +
facet_grid(cols = vars(course)) +
theme_minimal() +
theme(strip.text.x = element_text(size = 12)) +
labs(x = "Edinburgh MDT score", y = "Course result")
ggsave("output/uoe_pre_regression-spec.pdf", units = "cm", width = 16, height = 8)
This shows that the diagnostic test is moderately predictive of success in year 1.
On the non-specialist side, students take one of the following courses, each of which comes in two parts: 1a in semester 1 and 1b in semester 2.
course_results_vs_diagtest %>%
filter(course_type == "nonspec") %>%
janitor::tabyl(year, course) %>%
gt()
| year | EM1a | EM1b | MNS1a | MNS1b | MSE1a | MSE1b |
|---|---|---|---|---|---|---|
| 2013-14 | 0 | 0 | 0 | 0 | 316 | 310 |
| 2014-15 | 0 | 0 | 0 | 0 | 364 | 351 |
| 2015-16 | 231 | 215 | 61 | 57 | 0 | 0 |
| 2016-17 | 222 | 223 | 63 | 59 | 0 | 0 |
course_results_vs_diagtest %>%
filter(course_type == "nonspec") %>%
separate(course, into = c("course", "semester"), sep = "\\d") %>%
mutate(semester = ifelse(semester == "a", "Semester 1", "Semester 2")) %>%
mutate(course = fct_relevel(course, "MSE", "EM", "MNS")) %>%
ggplot(aes(x = diagtest_score, y = mark)) +
geom_point(size = 1, stroke = 0) +
geom_smooth(method = lm, formula = y ~ x) +
ggpubr::stat_cor(label.y = 105, p.accuracy = 0.001) +
facet_grid(rows = vars(semester), cols = vars(course)) +
theme_minimal() +
theme(strip.text.x = element_text(size = 12)) +
labs(x = "Edinburgh MDT score", y = "Course result")
ggsave("output/uoe_pre_regression-nonspec.pdf", units = "cm", width = 16, height = 16)
We see a similar pattern of the diagnostic test being moderately predictive of Semester 1 results, and still somewhat predictive in Semester 2.
This report supports the analysis in the following paper:
[citation needed]
In this analysis we used the following packages. You can learn more about each one by clicking on the links below.